Overview

Dataset statistics

Number of variables: 11
Number of observations: 6370
Missing cells: 14644
Missing cells (%): 20.9%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 504.0 KiB
Average record size in memory: 81.0 B
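
The missing-cell percentage follows directly from the counts in this report. A minimal Python sketch of the arithmetic, using the per-column missing counts listed under Alerts (the variable names are illustrative only):

```python
# Figures copied from the dataset statistics and the "Missing" alerts in this report.
n_variables = 11
n_observations = 6370
missing_per_column = {
    "discount_promotional": 3146,
    "discount_clearance": 3870,
    "discount_damaged_good": 3400,
    "discount_competitive": 3305,
    "CPI": 468,
    "Unemployment": 455,
}

total_cells = n_variables * n_observations         # every row/column cell
missing_cells = sum(missing_per_column.values())   # should match "Missing cells"
missing_pct = 100 * missing_cells / total_cells

print(f"{missing_cells} of {total_cells} cells missing ({missing_pct:.1f}%)")
```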

Variable types

Numeric: 9
Categorical: 1
Boolean: 1

Alerts

Date has a high cardinality: 182 distinct values (High cardinality)
discount_promotional is highly correlated with discount_competitive (High correlation)
discount_damaged_good is highly correlated with IsHoliday (High correlation)
discount_competitive is highly correlated with discount_promotional (High correlation)
IsHoliday is highly correlated with discount_damaged_good (High correlation)
Store is highly correlated with CPI and 1 other field (High correlation)
gas_price is highly correlated with Unemployment (High correlation)
CPI is highly correlated with Store and 1 other field (High correlation)
Unemployment is highly correlated with Store and 2 other fields (High correlation)
discount_promotional has 3146 (49.4%) missing values (Missing)
discount_clearance has 3870 (60.8%) missing values (Missing)
discount_damaged_good has 3400 (53.4%) missing values (Missing)
discount_competitive has 3305 (51.9%) missing values (Missing)
CPI has 468 (7.3%) missing values (Missing)
Unemployment has 455 (7.1%) missing values (Missing)
Date is uniformly distributed (Uniform)

Reproduction

Analysis started: 2022-11-05 05:47:28.998892
Analysis finished: 2022-11-05 05:47:36.964294
Duration: 7.97 seconds
Software version: pandas-profiling v3.4.0
Download configuration: config.json
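
A report of this kind can presumably be regenerated with pandas-profiling's `ProfileReport`. The sketch below is an assumption about how such a run might look: the CSV path, output path, and report title are hypothetical, and the downloadable config.json is not applied here.

```python
from pathlib import Path


def build_report(csv_path: str, html_path: str = "report.html") -> Path:
    """Profile a CSV file and write an HTML report (both paths are hypothetical)."""
    # Imported lazily so the helper can be defined without the packages installed.
    import pandas as pd
    from pandas_profiling import ProfileReport  # pandas-profiling v3.x

    df = pd.read_csv(csv_path)
    report = ProfileReport(df, title="Store features profile")  # hypothetical title
    report.to_file(html_path)
    return Path(html_path)
```

Calling `build_report("features.csv")` (a hypothetical file name) would write `report.html` next to the script.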

Variables

Store
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 35
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 18
Minimum: 1
Maximum: 35
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 49.9 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 9
Median: 18
Q3: 27
95-th percentile: 34
Maximum: 35
Range: 34
Interquartile range (IQR): 18

Descriptive statistics

Standard deviation: 10.10029777
Coefficient of variation (CV): 0.561127654
Kurtosis: -1.201962265
Mean: 18
Median Absolute Deviation (MAD): 9
Skewness: 0
Sum: 114660
Variance: 102.0160151
Monotonicity: Increasing
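
The Store figures can be checked by hand, because the frequency tables report each of the 35 store IDs exactly 182 times. The sketch below rebuilds that column with the Python standard library and recomputes several of the statistics. It assumes the report uses the sample (n - 1) variance and linearly interpolated quantiles, which is consistent with the values shown.

```python
import statistics

# Rebuild the Store column: IDs 1..35, each appearing 182 times (6370 rows).
store = [s for s in range(1, 36) for _ in range(182)]

mean = statistics.mean(store)
variance = statistics.variance(store)      # sample variance (n - 1 denominator)
std = variance ** 0.5
cv = std / mean                            # coefficient of variation
median = statistics.median(store)
mad = statistics.median(abs(x - median) for x in store)  # median absolute deviation
q1, _, q3 = statistics.quantiles(store, n=4, method="inclusive")

print(f"mean={mean} variance={variance:.7f} std={std:.8f} cv={cv:.9f}")
print(f"mad={mad} iqr={q3 - q1} sum={sum(store)}")
```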
Histogram with fixed size bins (bins=35)
Value               Count   Frequency (%)
1                   182     2.9%
27                  182     2.9%
21                  182     2.9%
22                  182     2.9%
23                  182     2.9%
24                  182     2.9%
25                  182     2.9%
26                  182     2.9%
28                  182     2.9%
19                  182     2.9%
Other values (25)   4550    71.4%

Value               Count   Frequency (%)
1                   182     2.9%
2                   182     2.9%
3                   182     2.9%
4                   182     2.9%
5                   182     2.9%
6                   182     2.9%
7                   182     2.9%
8                   182     2.9%
9                   182     2.9%
10                  182     2.9%

Value               Count   Frequency (%)
35                  182     2.9%
34                  182     2.9%
33                  182     2.9%
32                  182     2.9%
31                  182     2.9%
30                  182     2.9%
29                  182     2.9%
28                  182     2.9%
27                  182     2.9%
26                  182     2.9%

Date
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 182
Distinct (%): 2.9%
Missing: 0
Missing (%): 0.0%
Memory size: 49.9 KiB

2/5/2010: 35
4/13/2012: 35
4/27/2012: 35
5/4/2012: 35
5/11/2012: 35
Other values (177): 6195

Length

Max length: 10
Median length: 9
Mean length: 8.923076923
Min length: 8

Characters and Unicode

Total characters: 56840
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2/5/2010
2nd row: 2/12/2010
3rd row: 2/19/2010
4th row: 2/26/2010
5th row: 3/5/2010
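
Date is profiled as a categorical string, which is why it triggers the high-cardinality alert. If calendar dates are wanted instead, the M/D/YYYY strings can be parsed. The sketch below does this with the standard library for the five sample rows, assuming month-first ordering (consistent with values such as 4/13/2012 elsewhere in the report).

```python
from datetime import datetime

sample = ["2/5/2010", "2/12/2010", "2/19/2010", "2/26/2010", "3/5/2010"]

# %m and %d accept non-zero-padded fields when parsing.
dates = [datetime.strptime(s, "%m/%d/%Y").date() for s in sample]

# Gaps between consecutive sample rows, in days.
gaps = [(b - a).days for a, b in zip(dates, dates[1:])]
print(dates[0].isoformat(), gaps)
```

For these five rows the gaps are all seven days, which suggests weekly observations.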

Common Values

Value                Count   Frequency (%)
2/5/2010             35      0.5%
4/13/2012            35      0.5%
4/27/2012            35      0.5%
5/4/2012             35      0.5%
5/11/2012            35      0.5%
5/18/2012            35      0.5%
5/25/2012            35      0.5%
6/1/2012             35      0.5%
6/8/2012             35      0.5%
6/15/2012            35      0.5%
Other values (172)   6020    94.5%

Length

Histogram of lengths of the category
Value                Count   Frequency (%)
2/5/2010             35      0.5%
7/9/2010             35      0.5%
7/2/2010             35      0.5%
2/19/2010            35      0.5%
2/26/2010            35      0.5%
3/5/2010             35      0.5%
3/12/2010            35      0.5%
3/19/2010            35      0.5%
3/26/2010            35      0.5%
4/2/2010             35      0.5%
Other values (172)   6020    94.5%

Most occurring characters

Value   Count   Frequency (%)
1       13335   23.5%
/       12740   22.4%
2       11935   21.0%
0       9100    16.0%
3       2590    4.6%
5       1260    2.2%
4       1260    2.2%
7       1225    2.2%
6       1225    2.2%
8       1085    1.9%

Most occurring categories

Value               Count   Frequency (%)
Decimal Number      44100   77.6%
Other Punctuation   12740   22.4%

Most frequent character per category

Decimal Number
Value   Count   Frequency (%)
1       13335   30.2%
2       11935   27.1%
0       9100    20.6%
3       2590    5.9%
5       1260    2.9%
4       1260    2.9%
7       1225    2.8%
6       1225    2.8%
8       1085    2.5%
9       1085    2.5%

Other Punctuation
Value   Count   Frequency (%)
/       12740   100.0%

Most occurring scripts

Value    Count   Frequency (%)
Common   56840   100.0%

Most frequent character per script

Common
Value   Count   Frequency (%)
1       13335   23.5%
/       12740   22.4%
2       11935   21.0%
0       9100    16.0%
3       2590    4.6%
5       1260    2.2%
4       1260    2.2%
7       1225    2.2%
6       1225    2.2%
8       1085    1.9%

Most occurring blocks

Value   Count   Frequency (%)
ASCII   56840   100.0%

Most frequent character per block

ASCII
Value   Count   Frequency (%)
1       13335   23.5%
/       12740   22.4%
2       11935   21.0%
0       9100    16.0%
3       2590    4.6%
5       1260    2.2%
4       1260    2.2%
7       1225    2.2%
6       1225    2.2%
8       1085    1.9%

IsHoliday
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 6.3 KiB

Value   Count   Frequency (%)
False   5915    92.9%
True    455     7.1%

Temperature
Real number (ℝ)

Distinct: 3759
Distinct (%): 59.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 58.60789011
Minimum: -7.29
Maximum: 101.95
Zeros: 0
Zeros (%): 0.0%
Negative: 4
Negative (%): 0.1%
Memory size: 49.9 KiB

Quantile statistics

Minimum: -7.29
5-th percentile: 26.49
Q1: 45.015
Median: 59.91
Q3: 72.99
95-th percentile: 87.0155
Maximum: 101.95
Range: 109.24
Interquartile range (IQR): 27.975

Descriptive statistics

Standard deviation: 18.72455401
Coefficient of variation (CV): 0.3194886214
Kurtosis: -0.6143786632
Mean: 58.60789011
Median Absolute Deviation (MAD): 13.96
Skewness: -0.2473546907
Sum: 373332.26
Variance: 350.608923
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count   Frequency (%)
70.28                 9       0.1%
67.87                 9       0.1%
50.43                 9       0.1%
46.28                 8       0.1%
47.55                 7       0.1%
64.21                 7       0.1%
64.28                 7       0.1%
59.94                 7       0.1%
62.62                 6       0.1%
45                    6       0.1%
Other values (3749)   6295    98.8%

Value                 Count   Frequency (%)
-7.29                 1       < 0.1%
-6.61                 1       < 0.1%
-6.08                 1       < 0.1%
-2.06                 1       < 0.1%
0.25                  1       < 0.1%
2.32                  1       < 0.1%
2.45                  1       < 0.1%
4                     1       < 0.1%
5.54                  1       < 0.1%
6.23                  1       < 0.1%

Value                 Count   Frequency (%)
101.95                2       < 0.1%
100.14                1       < 0.1%
100.07                1       < 0.1%
99.66                 2       < 0.1%
99.22                 2       < 0.1%
99.2                  1       < 0.1%
98.43                 1       < 0.1%
98.15                 1       < 0.1%
97.66                 1       < 0.1%
97.6                  1       < 0.1%

gas_price
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 943
Distinct (%): 14.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.41654898
Minimum: 2.514
Maximum: 4.468
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 49.9 KiB

Quantile statistics

Minimum: 2.514
5-th percentile: 2.689
Q1: 3.054
Median: 3.524
Q3: 3.749
95-th percentile: 4.0241
Maximum: 4.468
Range: 1.954
Interquartile range (IQR): 0.695

Descriptive statistics

Standard deviation: 0.4278726182
Coefficient of variation (CV): 0.1252353239
Kurtosis: -0.9312876775
Mean: 3.41654898
Median Absolute Deviation (MAD): 0.292
Skewness: -0.329129282
Sum: 21763.417
Variance: 0.1830749774
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                Count   Frequency (%)
3.417                36      0.6%
3.638                34      0.5%
3.63                 33      0.5%
3.583                32      0.5%
3.62                 32      0.5%
3.227                26      0.4%
3.666                26      0.4%
3.611                25      0.4%
3.622                24      0.4%
3.386                24      0.4%
Other values (933)   6078    95.4%

Value                Count   Frequency (%)
2.514                10      0.2%
2.54                 3       < 0.1%
2.548                10      0.2%
2.55                 3       < 0.1%
2.561                10      0.2%
2.565                10      0.2%
2.572                13      0.2%
2.573                3       < 0.1%
2.574                2       < 0.1%
2.577                10      0.2%

Value                Count   Frequency (%)
4.468                4       0.1%
4.449                4       0.1%
4.308                2       < 0.1%
4.301                4       0.1%
4.294                4       0.1%
4.293                2       < 0.1%
4.288                2       < 0.1%
4.282                2       < 0.1%
4.277                4       0.1%
4.273                4       0.1%

discount_promotional
Real number (ℝ)

HIGH CORRELATION
MISSING

Distinct: 3141
Distinct (%): 97.4%
Missing: 3146
Missing (%): 49.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 7832.565958
Minimum: -2781.45
Maximum: 103184.98
Zeros: 0
Zeros (%): 0.0%
Negative: 2
Negative (%): < 0.1%
Memory size: 49.9 KiB

Quantile statistics

Minimum: -2781.45
5-th percentile: 370.99
Q1: 2638.645
Median: 5427.955
Q3: 9553.4425
95-th percentile: 22692.948
Maximum: 103184.98
Range: 105966.43
Interquartile range (IQR): 6914.7975

Descriptive statistics

Standard deviation: 9512.725241
Coefficient of variation (CV): 1.214509433
Kurtosis: 23.87362661
Mean: 7832.565958
Median Absolute Deviation (MAD): 3198.88
Skewness: 4.079904893
Sum: 25252192.65
Variance: 90491941.52
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count   Frequency (%)
9667.5                2       < 0.1%
29091.04              2       < 0.1%
9649.15               2       < 0.1%
2177.83               2       < 0.1%
2657.14               2       < 0.1%
3634.62               2       < 0.1%
6458.44               2       < 0.1%
3805.93               2       < 0.1%
2005.31               2       < 0.1%
7962.18               2       < 0.1%
Other values (3131)   3204    50.3%
(Missing)             3146    49.4%

Value                 Count   Frequency (%)
-2781.45              1       < 0.1%
-563.9                1       < 0.1%
0.5                   1       < 0.1%
1.5                   1       < 0.1%
2.5                   1       < 0.1%
2.82                  1       < 0.1%
5.76                  1       < 0.1%
7.82                  1       < 0.1%
8.62                  1       < 0.1%
9.59                  1       < 0.1%

Value                 Count   Frequency (%)
103184.98             1       < 0.1%
95102.5               1       < 0.1%
88750.34              1       < 0.1%
88646.76              1       < 0.1%
84139.36              1       < 0.1%
80498.65              1       < 0.1%
78124.5               1       < 0.1%
77017.24              1       < 0.1%
75522.86              1       < 0.1%
75149.79              1       < 0.1%

discount_clearance
Real number (ℝ)

MISSING

Distinct: 2282
Distinct (%): 91.3%
Missing: 3870
Missing (%): 60.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 3536.943988
Minimum: -265.76
Maximum: 104519.54
Zeros: 3
Zeros (%): < 0.1%
Negative: 23
Negative (%): 0.4%
Memory size: 49.9 KiB

Quantile statistics

Minimum: -265.76
5-th percentile: 3.195
Q1: 70.74
Median: 360.28
Q3: 2413.8225
95-th percentile: 18657.7065
Maximum: 104519.54
Range: 104785.3
Interquartile range (IQR): 2343.0825

Descriptive statistics

Standard deviation: 9108.132594
Coefficient of variation (CV): 2.575141881
Kurtosis: 31.31713162
Mean: 3536.943988
Median Absolute Deviation (MAD): 350.79
Skewness: 4.895291162
Sum: 8842359.97
Variance: 82958079.35
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count   Frequency (%)
1.91                  9       0.1%
4                     7       0.1%
3                     7       0.1%
19                    6       0.1%
0.03                  6       0.1%
0.01                  6       0.1%
11                    5       0.1%
9                     5       0.1%
5.73                  5       0.1%
3.82                  5       0.1%
Other values (2272)   2439    38.3%
(Missing)             3870    60.8%

Value                 Count   Frequency (%)
-265.76               1       < 0.1%
-192                  1       < 0.1%
-35.74                1       < 0.1%
-15.45                1       < 0.1%
-10.98                1       < 0.1%
-10.5                 2       < 0.1%
-9.98                 1       < 0.1%
-9.94                 1       < 0.1%
-7.76                 1       < 0.1%
-7.6                  1       < 0.1%

Value                 Count   Frequency (%)
104519.54             1       < 0.1%
97740.99              1       < 0.1%
92523.94              1       < 0.1%
89121.94              1       < 0.1%
82881.16              1       < 0.1%
72413.71              1       < 0.1%
71074.17              1       < 0.1%
70574.85              1       < 0.1%
59362.3               1       < 0.1%
58804.91              1       < 0.1%

discount_damaged_good
Real number (ℝ)

HIGH CORRELATION
MISSING

Distinct: 2437
Distinct (%): 82.1%
Missing: 3400
Missing (%): 53.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 1946.11564
Minimum: -179.26
Maximum: 149483.31
Zeros: 1
Zeros (%): < 0.1%
Negative: 9
Negative (%): 0.1%
Memory size: 49.9 KiB

Quantile statistics

Minimum: -179.26
5-th percentile: 0.8
Q1: 7.7
Median: 44.99
Q3: 185.2825
95-th percentile: 1408.6095
Maximum: 149483.31
Range: 149662.57
Interquartile range (IQR): 177.5825

Descriptive statistics

Standard deviation: 11850.71185
Coefficient of variation (CV): 6.089418127
Kurtosis: 65.89481834
Mean: 1946.11564
Median Absolute Deviation (MAD): 42.59
Skewness: 7.765311128
Sum: 5779963.45
Variance: 140439371.4
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count   Frequency (%)
1                     15      0.2%
3                     14      0.2%
2                     11      0.2%
4                     10      0.2%
0.6                   10      0.2%
6                     9       0.1%
0.5                   8       0.1%
0.3                   8       0.1%
1.2                   8       0.1%
0.22                  7       0.1%
Other values (2427)   2870    45.1%
(Missing)             3400    53.4%

Value                 Count   Frequency (%)
-179.26               1       < 0.1%
-89.1                 1       < 0.1%
-44.54                1       < 0.1%
-29.1                 1       < 0.1%
-23.97                1       < 0.1%
-14.29                1       < 0.1%
-2.58                 1       < 0.1%
-1                    1       < 0.1%
-0.86                 1       < 0.1%
0                     1       < 0.1%

Value                 Count   Frequency (%)
149483.31             1       < 0.1%
146394.44             1       < 0.1%
141630.61             1       < 0.1%
139621.51             1       < 0.1%
130129.11             1       < 0.1%
115048.81             1       < 0.1%
112255.67             1       < 0.1%
109030.75             1       < 0.1%
105691.67             1       < 0.1%
105146.3              1       < 0.1%

discount_competitive
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 2959
Distinct (%): 96.5%
Missing: 3305
Missing (%): 51.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 3330.461077
Minimum: 0.46
Maximum: 67474.85
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 49.9 KiB

Quantile statistics

Minimum: 0.46
5-th percentile: 27.022
Q1: 354.12
Median: 1201.3
Q3: 3338.11
95-th percentile: 12861.768
Maximum: 67474.85
Range: 67474.39
Interquartile range (IQR): 2983.99

Descriptive statistics

Standard deviation: 6824.986916
Coefficient of variation (CV): 2.049261877
Kurtosis: 28.98549732
Mean: 3330.461077
Median Absolute Deviation (MAD): 1060.72
Skewness: 4.878394693
Sum: 10207863.2
Variance: 46580446.4
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count   Frequency (%)
9                     4       0.1%
3                     3       < 0.1%
8                     3       < 0.1%
813.77                3       < 0.1%
2.5                   3       < 0.1%
149.77                2       < 0.1%
496.24                2       < 0.1%
262.81                2       < 0.1%
433.38                2       < 0.1%
1534.4                2       < 0.1%
Other values (2949)   3039    47.7%
(Missing)             3305    51.9%

Value                 Count   Frequency (%)
0.46                  1       < 0.1%
0.63                  1       < 0.1%
0.66                  1       < 0.1%
0.87                  1       < 0.1%
1.88                  1       < 0.1%
1.92                  1       < 0.1%
1.94                  1       < 0.1%
2                     2       < 0.1%
2.5                   3       < 0.1%
2.52                  1       < 0.1%

Value                 Count   Frequency (%)
67474.85              1       < 0.1%
65344.64              1       < 0.1%
63130.81              1       < 0.1%
60065.82              1       < 0.1%
57817.56              1       < 0.1%
57815.43              1       < 0.1%
56735.25              1       < 0.1%
56600.97              1       < 0.1%
53603.99              1       < 0.1%
52850.8               1       < 0.1%

CPI
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 2089
Distinct (%): 35.4%
Missing: 468
Missing (%): 7.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 171.3886234
Minimum: 126.064
Maximum: 228.9764563
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 49.9 KiB

Quantile statistics

Minimum: 126.064
5-th percentile: 126.6019032
Q1: 132.5461333
Median: 142.4460738
Q3: 214.7027646
95-th percentile: 225.1732488
Maximum: 228.9764563
Range: 102.9124563
Interquartile range (IQR): 82.1566313

Descriptive statistics

Standard deviation: 40.24166294
Coefficient of variation (CV): 0.2347977488
Kurtosis: -1.822368068
Mean: 171.3886234
Median Absolute Deviation (MAD): 16.3100093
Skewness: 0.1627910466
Sum: 1011535.655
Variance: 1619.391436
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count   Frequency (%)
132.7160968           24      0.4%
139.1226129           21      0.3%
224.8025314           12      0.2%
143.2200737           9       0.1%
228.7298638           9       0.1%
126.5522857           9       0.1%
126.4962581           9       0.1%
126.4420645           9       0.1%
126.5262857           9       0.1%
201.0705712           9       0.1%
Other values (2079)   5782    90.8%
(Missing)             468     7.3%

Value                 Count   Frequency (%)
126.064               8       0.1%
126.0766452           8       0.1%
126.0854516           8       0.1%
126.0892903           8       0.1%
126.1019355           8       0.1%
126.1069032           8       0.1%
126.1119032           8       0.1%
126.114               8       0.1%
126.1145806           8       0.1%
126.1266              8       0.1%

Value                 Count   Frequency (%)
228.9764563           3       < 0.1%
228.8892482           1       < 0.1%
228.8020401           1       < 0.1%
228.7796682           3       < 0.1%
228.7298638           9       0.1%
228.714832            1       < 0.1%
228.6926456           1       < 0.1%
228.6428882           3       < 0.1%
228.6276239           1       < 0.1%
228.605623            1       < 0.1%

Unemployment
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 352
Distinct (%): 6.0%
Missing: 455
Missing (%): 7.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.759504818
Minimum: 3.684
Maximum: 14.313
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 49.9 KiB

Quantile statistics

Minimum: 3.684
5-th percentile: 5.277
Q1: 6.617
Median: 7.767
Q3: 8.513
95-th percentile: 10.581
Maximum: 14.313
Range: 10.629
Interquartile range (IQR): 1.896

Descriptive statistics

Standard deviation: 1.7808631
Coefficient of variation (CV): 0.2295073128
Kurtosis: 3.012236015
Mean: 7.759504818
Median Absolute Deviation (MAD): 0.9
Skewness: 1.15741994
Sum: 45897.471
Variance: 3.17147338
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                Count   Frequency (%)
8.099                78      1.2%
7.852                70      1.1%
7.931                65      1.0%
6.891                65      1.0%
7.057                65      1.0%
7.441                65      1.0%
6.565                65      1.0%
6.17                 65      1.0%
6.237                60      0.9%
8.028                57      0.9%
Other values (342)   5260    82.6%
(Missing)            455     7.1%

Value                Count   Frequency (%)
3.684                4       0.1%
3.879                13      0.2%
3.896                4       0.1%
3.921                13      0.2%
3.932                13      0.2%
4.077                13      0.2%
4.125                13      0.2%
4.145                13      0.2%
4.156                13      0.2%
4.261                13      0.2%

Value                Count   Frequency (%)
14.313               28      0.4%
14.18                26      0.4%
14.099               26      0.4%
14.021               24      0.4%
13.975               16      0.3%
13.736               26      0.4%
13.503               28      0.4%
12.89                26      0.4%
12.187               26      0.4%
11.627               26      0.4%

Interactions

[Pairwise interaction plots for the 9 numeric variables (81 panels); images not reproduced.]

Correlations


Auto

The auto setting is an easily interpretable pairwise column metric that selects the method by variable-type pair (vartype-vartype: method): categorical-categorical: Cramér's V; numerical-categorical: Cramér's V (using a discretized numerical column); numerical-numerical: Spearman's ρ. This configuration uses the most suitable method for each pair of columns.
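
To make the categorical case concrete, here is a minimal sketch of Cramér's V computed from a contingency table of two label sequences. It is the plain, uncorrected statistic; whether the report applies a bias correction or a particular discretization for numerical columns is not stated here, so treat this as an illustration only.

```python
from collections import Counter
from math import sqrt


def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equal-length categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)

    # Pearson chi-squared statistic over every cell of the contingency table.
    chi2 = 0.0
    for r, r_count in row_totals.items():
        for c, c_count in col_totals.items():
            expected = r_count * c_count / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(row_totals), len(col_totals)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0


perfect = cramers_v(["a", "a", "b", "b"], ["x", "x", "y", "y"])      # fully associated
independent = cramers_v(["a", "a", "b", "b"], ["x", "y", "x", "y"])  # no association
print(perfect, independent)
```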

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
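
The rank-then-correlate recipe can be sketched in a few lines of standard-library Python. The helper names are illustrative, and ties are given average ranks, which is a common convention but an assumption here.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance divided by the product of the standard deviations."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Pearson correlation of the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # nonlinear but strictly increasing
rho, r = spearman(x, y), pearson(x, y)
print(rho, r)
```

On this strictly increasing but nonlinear example, ρ is 1 (up to rounding) while Pearson's r stays below 1.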

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear relationship the steepness of the line (its angle to the x-axis) does not affect r beyond its sign.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
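
A small sketch of that calculation, together with a check of the location and scale invariance mentioned above. The data values are made up for illustration and do not come from the profiled dataset.

```python
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    sx = sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))
    sy = sqrt(sum((b - my) ** 2 for b in y) / (n - 1))
    return cov / (sx * sy)


# Illustrative data only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
# Shifting and rescaling either variable (with a positive factor) leaves r unchanged.
r_rescaled = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
print(round(r, 6), round(r_rescaled, 6))
```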

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
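
The pair-counting definition translates directly into code. The sketch below implements exactly the formula as worded above (often called τ-a); statistical libraries frequently apply a tie correction instead, so results on tied data may differ from the values in this report.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs; tied pairs count as neither."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)


same_order = kendall_tau_a([1, 2, 3, 4], [1, 2, 3, 4])   # all pairs concordant
reversed_order = kendall_tau_a([1, 2, 3, 4], [4, 3, 2, 1])  # all pairs discordant
mixed = kendall_tau_a([1, 2, 3], [1, 3, 2])              # 2 concordant, 1 discordant
print(same_order, reversed_order, mixed)
```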

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
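
Nullity correlation can be read as the correlation between per-column "value present" indicators. The sketch below illustrates the idea on two columns taken from the ten first rows shown in the Sample section; it demonstrates the concept and is not necessarily the exact routine behind the heatmap.

```python
from math import isnan, sqrt

nan = float("nan")
# discount_clearance and discount_damaged_good from the ten first rows of the sample.
clearance = [268.29, 1594.87, 3128.74, 1802.84, 601.32, 298.46, 80.89, 36.10, 99.83, nan]
damaged = [0.60, 2.20, 1.88, nan, nan, 1.39, 3.20, 1.43, 1.25, 24.45]


def present(column):
    """1 where a value is present, 0 where it is missing."""
    return [0 if isnan(v) else 1 for v in column]


def correlation(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    return cov / sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))


nullity_corr = correlation(present(clearance), present(damaged))
print(round(nullity_corr, 3))
```

A negative value here means that, within these ten rows, one column tends to be missing when the other is present.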

Sample

First rows

      Store  Date       IsHoliday  Temperature  gas_price  discount_promotional  discount_clearance  discount_damaged_good  discount_competitive  CPI         Unemployment
0     1      2/5/2010   False      59.33        3.360      9667.50               268.29              0.60                   8368.15               223.659114  6.833
1     1      2/12/2010  True       51.65        3.409      8687.47               1594.87             2.20                   2144.87               223.753643  6.833
2     1      2/19/2010  False      52.39        3.510      2706.87               3128.74             1.88                   2396.68               223.917015  6.833
3     1      2/26/2010  False      60.12        3.555      6129.28               1802.84             NaN                    301.48                224.132020  6.833
4     1      3/5/2010   False      61.65        3.630      3552.58               601.32              NaN                    2666.22               224.347025  6.833
5     1      3/12/2010  False      60.71        3.669      8368.50               298.46              1.39                   1340.29               224.562029  6.833
6     1      3/19/2010  False      64.00        3.734      2588.06               80.89               3.20                   909.76                224.716695  6.833
7     1      3/26/2010  False      66.53        3.787      737.08                36.10               1.43                   92.11                 224.790910  6.833
8     1      4/2/2010   False      69.36        3.845      825.10                99.83               1.25                   113.25                224.865125  6.833
9     1      4/9/2010   False      73.01        3.891      4433.71               NaN                 24.45                  1282.30               224.939340  6.664

Last rows

      Store  Date       IsHoliday  Temperature  gas_price  discount_promotional  discount_clearance  discount_damaged_good  discount_competitive  CPI         Unemployment
6360  35     5/24/2013  False      65.19        3.627      3116.05               598.90              471.12                 1030.77               NaN         NaN
6361  35     5/31/2013  False      64.65        3.646      4542.19               286.29              13.80                  36.90                 NaN         NaN
6362  35     6/7/2013   False      69.53        3.633      14769.81              154.12              53.59                  6333.27               NaN         NaN
6363  35     6/14/2013  False      67.64        3.632      6636.89               159.07              313.37                 2895.53               NaN         NaN
6364  35     6/21/2013  False      70.76        3.626      3252.39               118.02              65.02                  4722.43               NaN         NaN
6365  35     6/28/2013  False      77.34        3.639      4764.55               85.65               NaN                    4660.01               NaN         NaN
6366  35     7/5/2013   False      77.41        3.614      6333.87               138.09              610.35                 7224.45               NaN         NaN
6367  35     7/12/2013  False      80.74        3.614      4798.18               130.88              22.05                  2351.60               NaN         NaN
6368  35     7/19/2013  False      83.36        3.737      2318.53               122.30              23.45                  771.35                NaN         NaN
6369  35     7/26/2013  False      77.01        3.804      237.96                50.00               4.00                   23.64                 NaN         NaN